ABSTRACT

Small- and intermediate-scale galaxy clustering can be used to establish the galaxy-halo connection
to study galaxy formation and evolution and to tighten constraints on cosmological
parameters. With the increasing precision of galaxy clustering measurements from ongoing 
and forthcoming large galaxy surveys, accurate models are required to interpret the data and 
extract relevant information. We introduce a method based on high-resolution N-body simulations
to accurately and efficiently model the galaxy two-point correlation functions (2PCFs)
in projected and redshift spaces. The basic idea is to tabulate all information of haloes in the 
simulations necessary for computing the galaxy 2PCFs within the framework of the halo occupation
distribution or conditional luminosity function. It is equivalent to populating
dark matter haloes with galaxies and using the mock 2PCF measurements as the model predictions. Besides
the accurate 2PCF calculations, the method is also fast and therefore enables an efficient 
exploration of the parameter space. As an example of the method, we decompose the redshift- 
space galaxy 2PCF into different components based on the type of galaxy pairs and show the 
redshift-space distortion effect in each component. The generalizations and limitations of the 
method are discussed. 

Key words: cosmology: observations - cosmology: theory - galaxies: clustering -
galaxies: distances and redshifts - galaxies: haloes - galaxies: statistics - large-scale structure of
Universe


1 INTRODUCTION 

Over the past two decades, large galaxy redshift surveys, such as 
the Sloan Digital Sky Survey (SDSS; York et al. 2000), the Two- 
Degree Field Galaxy Redshift Survey (2dFGRS; Colless 1999), the 
SDSS-III (Eisenstein et al. 2011), and the WiggleZ Dark Energy 
Survey (Blake et al. 2011), have enabled us to study in detail the 
large-scale structure of the universe probed by galaxies. Galaxy 
clustering has become a powerful tool to study galaxy formation 
and evolution and to learn about cosmology. An informative way to 
interpret galaxy clustering is to link galaxies to the underlying dark 
matter halo population, whose formation and evolution are dominated
by gravitational interaction and whose properties are well
understood with analytic models and N-body simulations.

The commonly adopted descriptions of the connection between
galaxies and dark matter haloes include the halo occupation
distribution (HOD; e.g. Jing et al. 1998; Peacock & Smith 2000;
Seljak 2000; Scoccimarro et al. 2001; Berlind & Weinberg 2002;
Berlind et al. 2003; Zheng et al. 2005) and the conditional luminosity
function (CLF; e.g. Yang et al. 2003). The former specifies the
probability distribution of the number of galaxies in a given sample
as a function of halo mass, together with the spatial and velocity
distribution of galaxies inside haloes. The latter specifies the luminosity
distribution of galaxies as a function of halo mass. Given
a set of HOD or CLF parameters, with the halo population for a 
given cosmological model, galaxy clustering statistics can be predicted.
Such frameworks have been successfully applied to galaxy
clustering data to infer the connection between galaxy properties
and halo mass (see e.g. van den Bosch et al. 2003; Zehavi et al.
2005, 2011; Zheng et al. 2007, 2009; Guo et al. 2014; Skibba et al.
2015) and to constrain cosmology (e.g. van den Bosch et al. 2003;
Tinker et al. 2005; Cacciato et al. 2013; Reid et al. 2014). In particular,
the main clustering statistic used is the two-point correlation
function (2PCF) of galaxies, which is the focus of this paper as 
well. 

Halo properties, like their mass function and spatial clustering
(bias), can be understood analytically (e.g. Press & Schechter
1974; Mo et al. 1996; Sheth & Tormen 1999), and N-body simulations
also enable accurate fitting formulae to be obtained (e.g.
Jenkins et al. 2001; Tinker et al. 2008, 2010). Based on these, analytic
models of the galaxy 2PCF can be developed. The basic idea is to
decompose the 2PCF into contributions from intra-halo and inter-halo
galaxy pairs. The intra-halo component, or the one-halo term,
represents the highly nonlinear part of the 2PCF. The inter-halo 
component, or the two-halo term, can be largely modelled by linear 
theory. Such analytic models have the advantage of being computationally
inexpensive, and they can be used to efficiently probe the
HOD/CLF and cosmology parameter space. However, as the precision
of the 2PCF measurements in large galaxy surveys continues
to improve, the requirement on the accuracy of the analytic models
becomes more and more demanding. As pointed out in Zheng
(2004a), an accurate model of the galaxy 2PCF needs to incorporate 
the nonlinear growth of the matter power spectrum (e.g. Smith et al. 
2003), the halo exclusion effect, and the scale-dependent halo bias. 
In addition, the non-spherical shape of haloes should also be accounted
for (e.g. Tinker et al. 2005; van den Bosch et al. 2013).
These are just the factors to be taken into account in computing the
real-space or projected 2PCFs. For redshift-space 2PCFs, more factors
come into play. An accurate analytical description of the velocity
field of dark matter haloes in the nonlinear or weakly nonlinear
regime proves to be difficult and complex (e.g. Tinker 2007;
Reid & White 2011; Zu & Weinberg 2013). Therefore, an accurate 
analytic model of redshift-space 2PCFs on small and intermediate 
scales is still not within reach. 

The above complications faced by analytic models can all be 
avoided or greatly reduced if the 2PCF calculation is directly done 
with the outputs of N-body simulations. With the simulation, dark
matter haloes can be identified, and their properties (mass, velocity,
etc.) can be obtained. For a given set of HOD/CLF parameters,
one can populate haloes with galaxies accordingly (e.g. using dark
matter particles as tracers) and form a mock galaxy catalog. The
2PCFs measured from the mock catalog are then the model predictions
used to model the measurements from observations. Such
a method of directly populating simulations has been developed
and applied to model galaxy clustering data (e.g. White et al. 2011;
Parejko et al. 2013). This simulation-based model is attractive, as
more and more large high-resolution N-body simulations emerge.
It is also straightforward to implement. Once the mock catalog is 
produced, measuring the 2PCFs can be made fast (e.g. with tree 
code). However, populating haloes with a given set of HOD/CLF 
parameters is probably the most time-consuming step, as one needs 
to loop over all haloes of interest. In addition, information on individual
haloes and tracer particles is needed, like their positions and
velocities. Even with only a subset of all the particles in a high- 
resolution simulation, the amount of data can still be substantial. 

The purpose of this paper is to introduce a method that retains
the advantages of the simulation-based model while being much more
efficient in modelling galaxy clustering. The main idea is to decompose
the galaxy 2PCFs and compress the information in the simulation
by tabulating relevant clustering-related quantities of dark
matter haloes. We also apply a similar idea to extend the commonly
used sub-halo abundance matching method (SHAM; e.g.
Conroy et al. 2006). 

The paper is structured as follows. In Section 2, we formulate
the method, within the HOD/CLF-like framework and within
the halo/sub-halo framework. In Section 3, we show an example
of modelling redshift-space 2PCFs, which also provides an understanding
of the three-dimensional (3D) small- and intermediate-scale
galaxy redshift-space 2PCF and its multipoles by decomposing
them into the various components. In Section 4, we summarize
the method and discuss possible generalizations and limitations. 


2 SIMULATION-BASED METHOD OF CALCULATING
GALAXY 2PCFs

In our simulation-based method, we divide haloes identified in N-body
simulations into narrow bins of a given property, which determines
galaxy occupancy. In the commonly used HOD/CLF, the
property is the halo mass. In our presentation, we use halo mass as 
the halo variable, but the method can be generalized to any set of 
halo properties. 

The basic idea of the method is to decompose the galaxy 
2PCF into contributions from haloes of different masses, from one- 
halo and two-halo terms, and from different types of galaxy pairs 
(e.g. central-central, central-satellite, and satellite-satellite pairs). 
The decomposition also allows the separation between the halo 
occupation and halo clustering. The former relies on the specific 
HOD/CLF parameterization, while the latter can be calculated from 
the simulation. The method tabulates all relevant information
about the latter for efficient calculation of galaxy 2PCFs and exploration
of the HOD/CLF parameter space.

We first formulate the method in the HOD/CLF framework. 
We then apply a similar idea to the SHAM case, which provides
a more general SHAM method. 


2.1 Case with Simulation Particles 

Let us start with a given N-body simulation and a given set of
HOD/CLF parameters. To populate galaxies into a halo identified in
the simulation, we can put one galaxy at the halo ‘centre’ as a central
galaxy, according to the probability specified by the HOD/CLF
parameters. The halo ‘centre’ should be defined to reflect galaxy formation
physics. For example, a sensible choice is the position of
the potential minimum rather than the centre of mass. For satellites, we
can choose particles as tracers. In the usually adopted models, it is 
assumed that satellite galaxies follow dark matter particles inside 
haloes (e.g. Zheng 2004a; Tinker et al. 2005; van den Bosch et al. 
2013), an assumption with some theoretical basis (e.g. Nagai & Kravtsov 2005).
One can certainly modify the distribution profile as needed, and 
below we assume that the distribution of galaxies inside haloes has 
been specified and that the corresponding tracer particles have been 
selected for each halo. 
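
The population step described above can be sketched for a single halo as follows. This is only an illustration: the function name and arguments are hypothetical, the satellite number is drawn from a Poisson distribution, and the central and satellite draws are treated as independent, which is an assumption of the sketch.

```python
import numpy as np

def populate_halo(rng, centre, particles, n_cen_mean, n_sat_mean):
    """Return an (n_gal, 3) array of galaxy positions for one halo.

    centre     : position chosen as the halo 'centre' (e.g. potential minimum)
    particles  : (n_part, 3) positions of the tracer particles of this halo
    n_cen_mean : mean central occupation, used as a probability
    n_sat_mean : mean satellite occupation, used as a Poisson mean
    """
    particles = np.asarray(particles, dtype=float)
    galaxies = []
    # Place a central galaxy with probability <N_cen>.
    if rng.random() < n_cen_mean:
        galaxies.append(np.asarray(centre, dtype=float))
    # Draw the satellite number and pick that many distinct tracer particles.
    n_sat = min(rng.poisson(n_sat_mean), len(particles))
    idx = rng.choice(len(particles), size=n_sat, replace=False)
    galaxies.extend(particles[idx])
    return np.array(galaxies).reshape(-1, 3)
```

Looping such a function over all haloes is the step that the tabulation method is designed to avoid.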

We divide haloes in the simulation into $N$ narrow mass bins
and denote the mean number density of haloes in the mass bin
$\log M_i \pm d\log M_i/2$ as $\bar{n}_i$. The mean number density of galaxies
is computed as

$$\bar{n}_g = \sum_i \bar{n}_i \left[\langle N_{\rm cen}(M_i)\rangle + \langle N_{\rm sat}(M_i)\rangle\right], \qquad (1)$$

where $N_{\rm cen}(M)$ and $N_{\rm sat}(M)$ are the occupation numbers of central
and satellite galaxies in a halo of mass $M$, $\langle\,\rangle$ denotes the average
over all haloes of this mass, and $i = 1, ..., N$.
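
As a minimal illustration of equation (1), the sketch below sums a tabulated halo mass function weighted by the mean occupation numbers. The function name and the three-bin toy values are hypothetical and are not taken from any simulation.

```python
import numpy as np

def galaxy_number_density(n_halo, n_cen_mean, n_sat_mean):
    """Equation (1): sum over mass bins of n_i * (<N_cen> + <N_sat>)."""
    n_halo = np.asarray(n_halo, dtype=float)
    occupation = (np.asarray(n_cen_mean, dtype=float)
                  + np.asarray(n_sat_mean, dtype=float))
    return float(np.sum(n_halo * occupation))

# Toy table with three mass bins (hypothetical values).
n_halo = [1.0e-3, 1.0e-4, 1.0e-5]   # halo number density per bin
n_cen = [0.1, 0.9, 1.0]             # mean central occupation per bin
n_sat = [0.0, 0.2, 3.0]             # mean satellite occupation per bin
n_g = galaxy_number_density(n_halo, n_cen, n_sat)
```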

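In the halo-based model, the galaxy 2PCF $\xi_{gg}$ is computed as
the combination of two terms, $\xi_{gg} = 1 + \xi^{\rm 1h}_{gg} + \xi^{\rm 2h}_{gg}$ (Zheng
2004a), where the one-halo term $\xi^{\rm 1h}_{gg}$ (two-halo term $\xi^{\rm 2h}_{gg}$) comes from
the contribution of intra-halo (inter-halo) galaxy pairs. Following
Berlind & Weinberg (2002), the one-halo term can be computed
based on

$$\frac{1}{2}\bar{n}_g(\bar{n}_g d^3\mathbf{r})\left[1 + \xi^{\rm 1h}_{gg}(\mathbf{r})\right] = \sum_i \bar{n}_i \langle N_{\rm pair}(M_i)\rangle f(\mathbf{r}; M_i)\, d^3\mathbf{r}. \qquad (2)$$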

The left-hand side (LHS) is the number density of one-halo pairs
with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ from the definition of 2PCF.
The right-hand side (RHS) is the same quantity from counting one-halo
pairs in each halo and the summation is over all the halo mass
bins. Here $\langle N_{\rm pair}(M)\rangle$ is the total mean number of galaxy pairs in
haloes of mass $M$, and $f(\mathbf{r}; M)$ is the probability distribution of
pair separation in haloes of mass $M$, i.e. $f(\mathbf{r}; M)d^3\mathbf{r}$ is the probability
of finding pairs with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ in haloes
of mass $M$. By further decomposing pairs into central-satellite (cen-sat)
and satellite-satellite (sat-sat) pairs, we reach the following expression,

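$$1 + \xi^{\rm 1h}_{gg}(\mathbf{r}) = \sum_i 2\,\frac{\bar{n}_i}{\bar{n}_g^2}\langle N_{\rm cen}(M_i)N_{\rm sat}(M_i)\rangle f_{\rm cs}(\mathbf{r}; M_i)
+ \sum_i \frac{\bar{n}_i}{\bar{n}_g^2}\langle N_{\rm sat}(M_i)[N_{\rm sat}(M_i) - 1]\rangle f_{\rm ss}(\mathbf{r}; M_i). \qquad (3)$$

The functions $f_{\rm cs}(\mathbf{r}; M)$ and $f_{\rm ss}(\mathbf{r}; M)$ are the probability distributions
of one-halo cen-sat and sat-sat galaxy pair separation in
haloes of mass $M$. They are normalized such that

$$\int f_{\rm cs}(\mathbf{r}; M)\, d^3\mathbf{r} = 1 \quad {\rm and} \quad \int f_{\rm ss}(\mathbf{r}; M)\, d^3\mathbf{r} = 1. \qquad (4)$$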

Note that here and in what follows, the 2PCF can be the real-space,
projected, or redshift-space 2PCF, or it can be the multipoles
of the redshift-space 2PCF. The variable $\mathbf{r}$ should be understood as
pair separation in the corresponding space. For redshift-space clustering,
we discuss how to specify the velocity distribution of galaxies
later.
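
The one-halo sum of equation (3) reduces to a weighted sum over the mass-bin axis of the tabulated pair-separation distributions. The sketch below assumes hypothetical array layouts (mass bins along the first axis, separation bins along the second) and takes the pair moments as inputs rather than deriving them from a particular occupation model.

```python
import numpy as np

def one_halo_term(n_halo, n_g, cen_sat_mean, sat_sat_mean, f_cs, f_ss):
    """Return 1 + xi_1h in each separation bin, following equation (3).

    n_halo       : (N,) halo number density per mass bin
    n_g          : mean galaxy number density from equation (1)
    cen_sat_mean : (N,) <N_cen N_sat> per mass bin
    sat_sat_mean : (N,) <N_sat (N_sat - 1)> per mass bin
    f_cs, f_ss   : (N, R) tabulated pair-separation distributions,
                   as probability per unit volume in each of R bins
    """
    weight = np.asarray(n_halo, dtype=float) / n_g**2
    cen_sat_mean = np.asarray(cen_sat_mean, dtype=float)
    sat_sat_mean = np.asarray(sat_sat_mean, dtype=float)
    f_cs = np.asarray(f_cs, dtype=float)
    f_ss = np.asarray(f_ss, dtype=float)
    # Sum over the mass-bin index i, leaving the separation-bin index r.
    cs = 2.0 * np.einsum("i,i,ir->r", weight, cen_sat_mean, f_cs)
    ss = np.einsum("i,i,ir->r", weight, sat_sat_mean, f_ss)
    return cs + ss
```

If the satellite number follows a Poisson distribution, the second moment $\langle N_{\rm sat}(N_{\rm sat} - 1)\rangle$ equals $\langle N_{\rm sat}\rangle^2$, so only the mean occupation is needed for the sat-sat weight.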

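To compute the two-halo term, we add up all possible two-halo
galaxy pairs, following the 2PCF decomposition from different pair
counts in Zu et al. (2008). Similar to equation (2), the total number
density of two-halo pairs with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ is

$$n_{\rm pair,2h} = \frac{1}{2}\bar{n}_g(\bar{n}_g d^3\mathbf{r})\left[1 + \xi^{\rm 2h}_{gg}(\mathbf{r})\right], \qquad (5)$$

which is composed of two-halo central-central (cen-cen) pairs

$$n_{\rm cc-pair,2h} = \sum_{i\neq j}\frac{1}{2}\left[\bar{n}_i\langle N_{\rm cen}(M_i)\rangle\right]\left[\bar{n}_j\langle N_{\rm cen}(M_j)\rangle d^3\mathbf{r}\right]
\times\left[1 + \xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)\right], \qquad (6)$$

two-halo cen-sat pairs

$$n_{\rm cs-pair,2h} = \sum_{i\neq j}\left[\bar{n}_i\langle N_{\rm cen}(M_i)\rangle\right]\left[\bar{n}_j\langle N_{\rm sat}(M_j)\rangle d^3\mathbf{r}\right]
\times\left[1 + \xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)\right], \qquad (7)$$

and two-halo sat-sat pairs

$$n_{\rm ss-pair,2h} = \sum_{i\neq j}\frac{1}{2}\left[\bar{n}_i\langle N_{\rm sat}(M_i)\rangle\right]\left[\bar{n}_j\langle N_{\rm sat}(M_j)\rangle d^3\mathbf{r}\right]
\times\left[1 + \xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)\right]. \qquad (8)$$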

In each of equations (6)-(8), the summation is over all halo mass
bins (i.e. $i = 1, ..., N$ and $j = 1, ..., N$). The three correlation functions
on the RHS have the following meanings: $\xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)$
is just the two-point cross-correlation function between ‘centres’
(positions to put central galaxies) of haloes of masses $M_i$ and
$M_j$ (cen-cen); $\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation
function between the ‘centres’ of $M_i$ haloes and the satellite tracer
particles in the (extended) $M_j$ haloes (cen-sat); $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$
is the two-point cross-correlation function between satellite tracer
particles in the (extended) $M_i$ haloes and those in the (extended)
$M_j$ haloes (sat-sat). With $n_{\rm pair,2h} = n_{\rm cc-pair,2h} + n_{\rm cs-pair,2h} +
n_{\rm ss-pair,2h}$, we reach the final expression for the two-halo term,

$$\xi^{\rm 2h}_{gg}(\mathbf{r}) = \sum_{i\neq j}\frac{\bar{n}_i\bar{n}_j}{\bar{n}_g^2}\langle N_{\rm cen}(M_i)\rangle\langle N_{\rm cen}(M_j)\rangle\,\xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)
+ \sum_{i\neq j} 2\,\frac{\bar{n}_i\bar{n}_j}{\bar{n}_g^2}\langle N_{\rm cen}(M_i)\rangle\langle N_{\rm sat}(M_j)\rangle\,\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)
+ \sum_{i\neq j}\frac{\bar{n}_i\bar{n}_j}{\bar{n}_g^2}\langle N_{\rm sat}(M_i)\rangle\langle N_{\rm sat}(M_j)\rangle\,\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j). \qquad (9)$$

Equations (1), (3), and (9) lead to the method we propose.
The quantities related to galaxy occupancy are specified
by the HOD/CLF parameterization one chooses, while those related
to haloes are from the simulation, independent of the
HOD/CLF parameterization. We therefore can prepare tables for
$\bar{n}_i$, $f_{\rm cs}(\mathbf{r}; M_i)$, $f_{\rm ss}(\mathbf{r}; M_i)$, $\xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)$, $\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$,
and $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$. For a given set of HOD/CLF parameters,
the predictions of galaxy 2PCFs can be obtained by performing
the weighted summation over the tables. The tables are only
prepared once, and we can then change the galaxy occupation as
needed to compute galaxy 2PCFs for different galaxy samples and
different sets of HOD/CLF parameters, which is much more efficient
than populating galaxies into haloes by selecting particles.
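
The weighted summation over the two-halo tables in equation (9) can be sketched as below. The array layout is hypothetical, and the sketch assumes that the double sum runs over all pairs of mass bins, including pairs of distinct haloes that fall in the same bin, with the tabulated correlation functions built from inter-halo pairs only.

```python
import numpy as np

def two_halo_term(n_halo, n_cen, n_sat, xi_cc, xi_cs, xi_ss):
    """Weighted sum of equation (9), returning xi_2h per separation bin.

    n_halo, n_cen, n_sat : (N,) halo density and mean occupations per bin
    xi_cc, xi_cs, xi_ss  : (N, N, R) tables; xi_cs[i, j] pairs the centres
                           of bin-i haloes with satellite tracers of bin-j
    """
    n_halo = np.asarray(n_halo, dtype=float)
    wc = n_halo * np.asarray(n_cen, dtype=float)   # central density per bin
    ws = n_halo * np.asarray(n_sat, dtype=float)   # satellite density per bin
    n_g = wc.sum() + ws.sum()                      # equation (1)
    cc = np.einsum("i,j,ijr->r", wc, wc, np.asarray(xi_cc, dtype=float))
    cs = 2.0 * np.einsum("i,j,ijr->r", wc, ws, np.asarray(xi_cs, dtype=float))
    ss = np.einsum("i,j,ijr->r", ws, ws, np.asarray(xi_ss, dtype=float))
    return (cc + cs + ss) / n_g**2
```

Changing the HOD/CLF parameters only changes the two weight vectors, so each model evaluation costs a few array contractions rather than a new mock catalog.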

Since summation is used to replace integration in the method,
we need to choose narrow halo mass bins ($d\log M = 0.01$ is usually
sufficient, as shown in Section 3). The $\bar{n}_i$ table represents the
halo mass function. To prepare the other tables that depend on
pair separation, the bins of pair separation $\mathbf{r}$ are best chosen to
match the ones used in the measurements from observational data,
which would naturally avoid any discrepancy related to the finite
bin sizes. For haloes in each mass bin, the $f_{\rm cs}$ and $f_{\rm ss}$ tables can
be computed by using either all the particles in the haloes with
the specified distribution or a random subset. For $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$,
and $\xi_{\rm hh,ss}$, we effectively compute the halo-halo two-point cross-correlation
function with different definitions of halo positions. For
$\xi_{\rm hh,cc}$, halo positions are defined by our choice of ‘centres’. For
$\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$, we choose ‘centres’ for $M_i$ haloes and positions
of arbitrary tracer particles in $M_j$ haloes. For $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$,
positions of arbitrary tracer particles in both $M_i$ and $M_j$ haloes are
chosen. We can use any number of tracer particles in each halo to
do the calculation. Haloes with positions defined by the tracer
particles can be thought of as extended (with positions having a
probability distribution). On large scales, $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$, and $\xi_{\rm hh,ss}$
are the same, while on small scales, $\xi_{\rm hh,cs}$ and $\xi_{\rm hh,ss}$ are smoothed
versions of $\xi_{\rm hh,cc}$. Note that in analytic models such differences
are usually neglected. In computing the three halo-halo correlation
functions, we do not need to construct random catalogs to find out
the pair counts from a uniform distribution: in the volume $V_{\rm sim}$
of the simulation with periodic boundary conditions, the counts of
cross-pairs at separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ between two randomly
distributed populations with number densities $\bar{n}_i$ and $\bar{n}_j$ are
simply $(\bar{n}_i V_{\rm sim})(\bar{n}_j d^3\mathbf{r})$. Making use of this fact can greatly reduce
the computational expense in preparing the tables.
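
The analytic random pair counts can be sketched as follows for the simplest case of real-space separations binned in spherical shells; the function names are hypothetical, and for other binnings (e.g. in $s$ and $\mu$) the bin volume would have to be replaced accordingly.

```python
import numpy as np

def expected_random_pairs(n_i, n_j, box_size, r_edges):
    """Expected cross-pair counts (n_i V_sim)(n_j dV) in spherical shells."""
    volume = box_size**3
    r_edges = np.asarray(r_edges, dtype=float)
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(r_edges**3)
    return (n_i * volume) * (n_j * shell_volume)

def cross_correlation(pair_counts, n_i, n_j, box_size, r_edges):
    """xi = DD / RR - 1, with RR known analytically in a periodic box.

    pair_counts must hold cross-pair counts, each i-j pair counted once.
    """
    rr = expected_random_pairs(n_i, n_j, box_size, r_edges)
    return np.asarray(pair_counts, dtype=float) / rr - 1.0
```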

For the redshift-space tables, in addition to the halo velocities,
one needs to specify the velocity distribution of galaxies inside
haloes, which can be different from that of dark matter particles
(a.k.a. velocity bias; e.g. Berlind & Weinberg 2002). The difference
can be parameterized by central and satellite velocity bias
parameters (e.g. Guo et al. 2015a). For a set of central and satellite
velocity bias parameters and with a choice of the line-of-sight
direction, we can obtain the redshift-space positions of the central
galaxy and satellite tracer particles according to halo velocities
and central and satellite galaxy velocity distributions inside haloes,
and the redshift-space tables can be computed. We suggest preparing
tables for different sets of central and satellite velocity bias
parameters and interpolating among tables to probe the velocity bias
parameter space, as is done in Guo et al. (2015a).
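
A sketch of how tracer positions might be moved to redshift space is given below. It uses the standard plane-parallel mapping along one box axis with a periodic wrap. Rescaling the satellite velocity relative to the host halo by a single factor is an assumption of this sketch, one simple way to realize a satellite velocity bias, and the central velocity bias is not modelled here; all names are hypothetical.

```python
import numpy as np

def satellite_velocity(v_halo, v_particle, alpha_s):
    """Scale the tracer velocity relative to its host halo by alpha_s."""
    return v_halo + alpha_s * (v_particle - v_halo)

def redshift_space_position(x_los, v_los, redshift, hubble, box_size):
    """Shift a comoving line-of-sight coordinate by the peculiar velocity.

    x_los, box_size : comoving coordinates, in h^-1 Mpc
    v_los           : line-of-sight peculiar velocity, in km/s
    hubble          : H(z) in km/s per h^-1 Mpc
    """
    s = x_los + v_los * (1.0 + redshift) / hubble
    return np.mod(s, box_size)   # periodic wrap back into the box
```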

Multipole moments of redshift-space galaxy 2PCFs are usually
modelled. We can derive the corresponding tables by computing
the corresponding multipole moments of $f_{\rm cs}$, $f_{\rm ss}$, $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$,
and $\xi_{\rm hh,ss}$. In such a case, $\mathbf{r}$ is expressed by $s = |\mathbf{r}|$ and $\mu$, the cosine
of the angle between $\mathbf{r}$ and the line-of-sight direction. In the
integration (summation) for obtaining the multipoles, the bins of
$\mu$ match those used in observational measurements to remove any
finite-bin-size effect.
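
The summation over $\mu$ bins can be sketched as below, assuming the usual Legendre multipole definition and $\mu$ bins that cover $[0, 1]$ for a function even in $\mu$. The function name and array layout are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def multipole(xi_s_mu, mu_edges, ell):
    """xi_ell(s) = (2 ell + 1) * sum_k xi(s, mu_k) P_ell(mu_k) dmu_k.

    xi_s_mu  : (..., n_mu) values of xi in bins of mu (last axis)
    mu_edges : (n_mu + 1,) bin edges covering 0 <= mu <= 1
    """
    mu_edges = np.asarray(mu_edges, dtype=float)
    mu_mid = 0.5 * (mu_edges[1:] + mu_edges[:-1])
    dmu = np.diff(mu_edges)
    coeffs = np.zeros(ell + 1)
    coeffs[ell] = 1.0                      # selects the Legendre polynomial P_ell
    p_ell = legval(mu_mid, coeffs)
    return (2 * ell + 1) * np.sum(np.asarray(xi_s_mu) * p_ell * dmu, axis=-1)
```

Because the model and the data are summed over identical $\mu$ bins, the discretization error of the sum affects both in the same way.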

For modelling the projected 2PCF $w_p$, a corresponding set of
tables can be obtained by integrating the redshift-space tables over 
the line-of-sight separation. The integration is done in the same way 
as in the measurements with data to avoid any finite-bin-size effect, 
summing over the same line-of-sight bins (with the same bin size) 
up to the same maximum line-of-sight separation. 
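
The line-of-sight summation can be sketched as follows, assuming the standard definition of the projected 2PCF as twice the integral of the redshift-space 2PCF over the line-of-sight separation from zero to the maximum value. The function name and array layout are hypothetical.

```python
import numpy as np

def projected_2pcf(xi_rp_pi, pi_edges):
    """w_p(r_p) = 2 * sum_k xi(r_p, pi_k) * dpi_k for 0 <= pi <= pi_max.

    xi_rp_pi : (n_rp, n_pi) redshift-space 2PCF in bins of (r_p, r_pi)
    pi_edges : (n_pi + 1,) line-of-sight bin edges, same as in the data
    """
    dpi = np.diff(np.asarray(pi_edges, dtype=float))
    return 2.0 * np.sum(np.asarray(xi_rp_pi, dtype=float) * dpi, axis=-1)
```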

2.2 Case with Subhaloes 

The SHAM method uses more information from (high-resolution)
simulations, including both distinct haloes and subhaloes identified
inside distinct haloes, where the distinct haloes refer to haloes
that are not subhaloes of another halo. Distinct haloes are also referred
to as haloes, main haloes, or host haloes. Central galaxies
are hosted by distinct haloes at the centres, while satellite galaxies
are in subhaloes. Before merging into distinct haloes, subhaloes are
distinct haloes themselves. The SHAM method generally works in
the following way. By adopting one property, subhaloes and distinct
haloes can be treated as a unified entity. For distinct haloes,
the property is evaluated at the time of interest. For subhaloes, it has
become common practice to evaluate the property at the time when
subhaloes were still distinct haloes. The properties commonly used
include mass ($M_{\rm acc}$) at the time a subhalo was accreted into a host
halo, maximum circular velocity $V_{\rm acc}$ at the time of accretion, and
peak maximum circular velocity $V_{\rm peak}$ over the history of the subhalo
as a distinct halo. The connection between haloes/subhaloes
and galaxies is established by rank ordering haloes/subhaloes according
to the given property and galaxies according to one certain
property (e.g. luminosity or stellar mass). When normalized
to the same survey/simulation volume, halo/subhalo and galaxy
of the same rank are linked. A more general treatment also accounts
for the scatter between the halo/subhalo property and the
galaxy property. The simple procedure of linking light (galaxies) to
matter (haloes/subhaloes) can provide a reasonable interpretation
of galaxy clustering trends and enable a study of galaxy evolution
(e.g. Conroy et al. 2006; Conroy & Wechsler 2009; Behroozi et al.
2013; Reddick et al. 2013).

We generalize the idea in Section 2.1 to the subhalo case, extending
the SHAM model and making it efficient to model galaxy
clustering. The model allows the scatter between the halo/subhalo
property and the galaxy property to be different for distinct haloes
(central galaxies) and subhaloes (satellite galaxies). We use mass as
the halo/subhalo property variable here, which can be understood
as the mass at accretion ($M_{\rm acc}$). However, it can be replaced by any
property one chooses to adopt, e.g. $V_{\rm acc}$ and $V_{\rm peak}$. A halo/subhalo
method following a similar spirit of pair decomposition to model
the projected galaxy 2PCF and weak lensing signal is presented in
Neistein et al. (2011) and Neistein & Khochfar (2012).

Compared to the commonly used SHAM method that connects
the whole range of galaxy property and halo/subhalo property,
the method presented here works for each individual galaxy sample.
To some degree, it is formulated in an HOD/CLF-like form,
with distinct haloes and subhaloes as tracers of central and satellite
galaxies, respectively. It is no longer limited to abundance matching.
Instead, the method can be used to fit both galaxy abundance
and galaxy clustering (2PCFs).

For a given galaxy sample, the scatter between halo/subhalo
property and galaxy property means that not all haloes/subhaloes
are fully occupied by these galaxies, which can be characterized by
the probability of occupancy (or the smaller-than-unity mean occupation
number). Denote the mean occupation number of central
galaxies in distinct haloes of mass $M_h$ as $p_{\rm cen}(M_h)$ and that of
satellite galaxies in subhaloes of mass $M_s$ as $p_{\rm sat}(M_s)$. The same
bins of mass are adopted for $M_h$ and $M_s$. In principle we do not
need to differentiate $M_s$ and $M_h$, since the subscripts ‘c’ (cen) and
‘s’ (sat) below make the situation self-explanatory. Let the mean
number densities of distinct haloes and subhaloes in the mass bin
$\log M_i \pm d\log M_i/2$ be $\bar{n}_{h,i}$ and $\bar{n}_{s,i}$, respectively.

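For a given sample of galaxies, with a model of $p_{\rm cen}(M)$ and
$p_{\rm sat}(M)$, the mean number density of galaxies $\bar{n}_g$ is computed as

$$\bar{n}_g = \sum_i \left[\bar{n}_{h,i}\,p_{\rm cen}(M_i) + \bar{n}_{s,i}\,p_{\rm sat}(M_i)\right]. \qquad (10)$$

With a similar decomposition as in equation (9), the galaxy 2PCF
can be computed as

$$\xi_{gg}(\mathbf{r}) = \sum_{i,j}\frac{\bar{n}_{h,i}\bar{n}_{h,j}}{\bar{n}_g^2}\,p_{\rm cen}(M_i)\,p_{\rm cen}(M_j)\,\xi_{\rm hh}(\mathbf{r}; M_i, M_j)
+ \sum_{i,j} 2\,\frac{\bar{n}_{h,i}\bar{n}_{s,j}}{\bar{n}_g^2}\,p_{\rm cen}(M_i)\,p_{\rm sat}(M_j)\,\xi_{\rm hs}(\mathbf{r}; M_i, M_j)
+ \sum_{i,j}\frac{\bar{n}_{s,i}\bar{n}_{s,j}}{\bar{n}_g^2}\,p_{\rm sat}(M_i)\,p_{\rm sat}(M_j)\,\xi_{\rm ss}(\mathbf{r}; M_i, M_j), \qquad (11)$$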

which simply states that the total number of galaxy pairs is the sum
of cen-cen, cen-sat, and sat-sat pairs. The three correlation functions
on the RHS have the following meanings: $\xi_{\rm hh}(\mathbf{r}; M_i, M_j)$ is
just the two-point cross-correlation function between centres of distinct
haloes of masses $M_i$ and $M_j$; $\xi_{\rm hs}(\mathbf{r}; M_i, M_j)$ is the two-point
cross-correlation function between centres of $M_i$ distinct haloes
and those of $M_j$ subhaloes; $\xi_{\rm ss}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation
function between centres of subhaloes of masses $M_i$
and $M_j$. Unlike the particle case in Section 2.1, there are no explicit
one-halo and two-halo terms here (though they can be derived), and
the $i \neq j$ condition is not imposed in the summation.

The quantities $p_{\rm cen}(M)$ and $p_{\rm sat}(M)$ come from the occupation
function model, which is up to our choice of parameterization
for the sample of galaxies. In this halo/subhalo-based method,
we only need to prepare tables for $\bar{n}_{h,i}$, $\bar{n}_{s,i}$, $\xi_{\rm hh}(\mathbf{r}; M_i, M_j)$,
$\xi_{\rm hs}(\mathbf{r}; M_i, M_j)$, and $\xi_{\rm ss}(\mathbf{r}; M_i, M_j)$.
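
Equations (10) and (11) can be evaluated from these tables in a few lines, as in the sketch below. The array layout and the function name are hypothetical, and the sums run over all pairs of mass bins, as stated in the text.

```python
import numpy as np

def subhalo_model(n_h, n_s, p_cen, p_sat, xi_hh, xi_hs, xi_ss):
    """Return (n_g, xi_gg) from equations (10) and (11).

    n_h, n_s            : (N,) densities of distinct haloes and subhaloes
    p_cen, p_sat        : (N,) mean occupation numbers (at most unity)
    xi_hh, xi_hs, xi_ss : (N, N, R) tabulated cross-correlation functions
    """
    wc = np.asarray(n_h, dtype=float) * np.asarray(p_cen, dtype=float)
    ws = np.asarray(n_s, dtype=float) * np.asarray(p_sat, dtype=float)
    n_g = wc.sum() + ws.sum()                                  # equation (10)
    xi = (np.einsum("i,j,ijr->r", wc, wc, np.asarray(xi_hh, dtype=float))
          + 2.0 * np.einsum("i,j,ijr->r", wc, ws, np.asarray(xi_hs, dtype=float))
          + np.einsum("i,j,ijr->r", ws, ws, np.asarray(xi_ss, dtype=float)))
    return n_g, xi / n_g**2                                    # equation (11)
```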

As with the tables using particles (Section 2.1), for redshift-space
2PCF or multipole moments, tables for different sets of central
and satellite velocity bias parameters can be prepared. For each
set, haloes and subhaloes are shifted to redshift-space positions for
calculation. Tables can also be generated for modelling the projected
2PCF $w_p$. The procedures and bins used in the measurements
should be followed so that the model and measurements are made
fully consistent.
Figure 1. Decomposition of the projected galaxy 2PCF $w_p$ and redshift-space 2PCF multipoles $\xi_0$, $\xi_2$, and $\xi_4$ into the various one-halo and two-halo
components (one-halo cen-sat, one-halo sat-sat, two-halo cen-cen, two-halo cen-sat, and two-halo sat-sat). The circles are measurements from 100 mock
galaxy catalogs constructed by populating galaxies into dark matter halos in the simulation, according to the set of fiducial HOD parameters. The curves are
calculations with the method introduced in this paper. See text for more details.


3 AN EXAMPLE APPLICATION AND THE
REDSHIFT-SPACE 2PCF DECOMPOSITION

The method developed here has been successfully applied to model
projected and redshift-space 2PCFs of SDSS and SDSS-III galaxies
on small to intermediate scales (e.g. Guo et al. 2015a,b,c) and
to compare HOD and SHAM models (Guo et al. in prep.). As the
method is built on the basis of decomposition of galaxy 2PCFs,
here we provide an example to illustrate the different 2PCF components.
In particular, we show the components for the redshift-space
3D 2PCF and the manifestation of redshift-space distortions
in each component to have a better understanding of the redshift-space
2PCFs within the HOD framework. In addition, we also investigate
how redshift-space 2PCFs help with HOD constraints,
including the inference of the galaxy velocity distribution inside 
haloes. 


The example adopts HOD parameters for the sample of
$z \sim 0.5$ CMASS galaxies in the SDSS-III Baryon Oscillation
Spectroscopic Survey (BOSS; Dawson et al. 2013). With spherical
overdensity haloes and halo particles from the $z = 0.53$ output of
the MultiDark simulation (MDR1; Prada et al. 2012; Riebe et al.
2013), we create tables for halo properties, including halo number
density $\bar{n}$ (i.e. halo mass function), projected 2PCF $w_p$, redshift-space
2PCF monopole $\xi_0$, quadrupole $\xi_2$, and hexadecapole $\xi_4$.
We choose the position of the potential minimum as the centre of
each halo for putting the central galaxy and halo particles as tracers
of satellites. Each of $w_p$ and $\xi_{0/2/4}$ has five components (one-halo
cen-sat, one-halo sat-sat, two-halo cen-cen, two-halo cen-sat,
and two-halo sat-sat). To generate the $w_p(r_p)$ tables, we measure
$\xi(r_p, r_\pi)$ for each component and for each combination of halo
mass bins and sum over the $r_\pi$ direction, where $r_p$ and $r_\pi$ are the
pair separations in the directions perpendicular and parallel to the
line-of-sight direction (chosen to be one principal direction of the
simulation box). To generate the $\xi_{0/2/4}$ tables, we measure $\xi(s, \mu)$
for each component and for each combination of halo mass bins and
form the multipoles by integrating over $\mu$, where $s$ is the redshift-space
pair separation and $\mu$ the cosine of the angle between pair
displacement and the line-of-sight direction. Following the setup
in the observational measurements (Guo et al. 2015a), we have 19
bins for $r_p$ and $s$ uniformly spaced in logarithmic space, 50 linearly
spaced bins in $r_\pi$, and 20 linearly spaced bins in $\mu$. For halo mass
bins, we use $d\log M = 0.01$. We construct tables for 5 bins of central
velocity bias parameter $\alpha_c$ and 8 bins of satellite velocity bias
parameter $\alpha_s$, respectively. The total size of the final set of tables
is about 10 GB. That is, the information in the high-resolution simulation
output relevant for modelling projected and redshift-space
2PCFs of galaxies has been tremendously compressed, making the
modelling tractable even with a desktop computer.

Figure 2. Decomposition of the 3D redshift-space 2PCF $\xi(r_p, r_\pi)$ into the various one-halo and two-halo components (one-halo cen-sat, one-halo sat-sat,
two-halo cen-cen, two-halo cen-sat, and two-halo sat-sat). The plot is based on the average measurements from 100 mock galaxy catalogs constructed by
populating galaxies into dark matter halos in the simulation, according to the set of fiducial HOD parameters. The color scale shows $\xi(r_p, r_\pi)$ in logarithmic
scale. See text for more details.

For the HOD, we adopt the common parameterization for a
sample of galaxies above a luminosity threshold (Zheng et al. 2005,
2007). The mean occupation function of central galaxies in haloes
of mass $M$ is

$$\langle N_{\rm cen}(M)\rangle = \frac{1}{2}\left[1 + {\rm erf}\left(\frac{\log M - \log M_{\rm min}}{\sigma_{\log M}}\right)\right], \qquad (12)$$

where erf is the error function. For the mean occupation function
of satellite galaxies, we use

$$\langle N_{\rm sat}(M)\rangle = \langle N_{\rm cen}(M)\rangle\left(\frac{M - M_0}{M_1'}\right)^{\alpha}. \qquad (13)$$

The number of satellites in haloes of mass $M$ is assumed to follow
the Poisson distribution with the above mean. In addition, for modelling
redshift-space 2PCFs, we have two additional HOD parameters
$\alpha_c$ and $\alpha_s$ for central and satellite velocity bias. Essentially,
$\alpha_c$ ($\alpha_s$) is the ratio of the velocity dispersion of central (satellite)
galaxies to that of dark matter particles inside halos (see Guo et al.
2015a). For the fiducial model, we adopt the set of parameters that
fit the projected and redshift-space 2PCFs for the CMASS sample
in Guo et al. (2015a): $\log M_{\rm min} = 13.36$, $\sigma_{\log M} = 0.64$,
$\log M_0 = 13.20$, $\log M_1' = 14.23$, $\alpha = 1.05$, $\alpha_c = 0.30$, and
$\alpha_s = 0.91$. Halo masses are in units of $h^{-1}M_\odot$.
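
The two mean occupation functions can be sketched as below with the fiducial values quoted above. The function and argument names are hypothetical, and setting the satellite occupation to zero for masses at or below $M_0$ is an assumption of the sketch, following common practice for this parameterization.

```python
import math
import numpy as np

def mean_ncen(mass, log_mmin, sigma_logm):
    """Equation (12): 0.5 * [1 + erf((log M - log M_min) / sigma_logM)]."""
    x = (np.log10(mass) - log_mmin) / sigma_logm
    return 0.5 * (1.0 + np.vectorize(math.erf)(x))

def mean_nsat(mass, log_mmin, sigma_logm, log_m0, log_m1p, alpha):
    """Equation (13), set to zero for M <= M_0 (assumed convention)."""
    mass = np.asarray(mass, dtype=float)
    excess = np.clip(mass - 10.0**log_m0, 0.0, None) / 10.0**log_m1p
    return mean_ncen(mass, log_mmin, sigma_logm) * excess**alpha

# Fiducial occupation parameters quoted in the text (masses in h^-1 M_sun).
fiducial = dict(log_mmin=13.36, sigma_logm=0.64,
                log_m0=13.20, log_m1p=14.23, alpha=1.05)

masses = 10.0**np.arange(12.5, 15.01, 0.5)
n_cen = mean_ncen(masses, fiducial["log_mmin"], fiducial["sigma_logm"])
n_sat = mean_nsat(masses, **fiducial)
```

Evaluated on the centres of the narrow mass bins, these two arrays are the weights that multiply the tabulated halo quantities.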

With the tables and the fiducial HOD parameters, we follow
equations (1), (3), and (9) to compute all the components of $w_p$
and $\xi_{0/2/4}$. For the purpose of a sanity check, we also measure the
components from 100 mock galaxy catalogs. The mock catalogs
are generated from populating haloes in the simulation by putting
central galaxies at the potential minimum in haloes and drawing
random dark matter particles as satellite galaxies, in accordance
with the occupation distributions and velocities set by the fiducial
HOD parameters. For the purpose of comparison with the model
based on the tables, we decompose the galaxy 2PCF (either $w_p$ or
$\xi_{0/2/4}$) measured in the mock catalogs into five components.


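$$\xi_{gg}(\mathbf{r}) = 2\,\frac{\bar{n}_{\rm cs-pair}}{\bar{n}_g^2}\,f_{\rm cs}(\mathbf{r}) + 2\,\frac{\bar{n}_{\rm ss-pair}}{\bar{n}_g^2}\,f_{\rm ss}(\mathbf{r})
+ \frac{\bar{n}_c^2}{\bar{n}_g^2}\,\xi_{\rm cc}(\mathbf{r}) + 2\,\frac{\bar{n}_c\bar{n}_s}{\bar{n}_g^2}\,\xi_{\rm cs}(\mathbf{r}) + \frac{\bar{n}_s^2}{\bar{n}_g^2}\,\xi_{\rm ss}(\mathbf{r}). \qquad (14)$$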


The first two terms on the RHS are one-halo terms: $\bar{n}_{\rm cs-pair}$ and
$\bar{n}_{\rm ss-pair}$ are the mean number densities of one-halo cen-sat pairs
and one-halo sat-sat pairs measured in the mock catalogs, and $f_{\rm cs}$
and $f_{\rm ss}$ are the normalized average distributions of one-halo cen-sat
and sat-sat pairs in the mock. The last three terms on the RHS
are two-halo terms: $\bar{n}_c$ and $\bar{n}_s$ are the mean number densities of
central and satellite galaxies in the mock, and $\xi_{\rm cc}$, $\xi_{\rm cs}$, $\xi_{\rm ss}$ are the
2PCFs by counting only two-halo cen-cen, cen-sat, and sat-sat pairs
(Zu et al. 2008).

Figure 1 shows the decomposition of $w_p$ and $\xi_{0/2/4}$ for the
fiducial model. As expected, the calculations from the
simulation-based method (curves) agree with the measurements from the mock
catalogs (circles), which is reassuring. For the projected 2PCF
(top-left panel), the one-halo cen-sat term (red) dominates the
small-scale signal. The one-halo sat-sat term (magenta) extends to larger
scales, since the maximum sat-sat pair separation in a halo is the
diameter of the halo, twice the maximum cen-sat pair separation. Owing
to the low satellite fraction ($f_{\rm sat} \sim 7\%$) of this sample of galaxies,
the contribution of the one-halo sat-sat pairs to $w_p$ is small overall,
but noticeable around $1\,h^{-1}{\rm Mpc}$, the scale of the one-halo to two-halo
transition. On large scales, the three two-halo terms have a
similar shape, since they essentially follow the halo-halo
correlation. The flattening towards small scales is caused by the halo
exclusion effect. Compared to the two-halo cen-cen component, the
two-halo cen-sat component is smoothed on small scales: because of the
spatial distribution of satellites inside haloes, each halo
contributing the satellite of the cen-sat pair is on average extended,
rather than a point source (as is the case for the halo contributing
the central galaxy of the pair). The two-halo sat-sat term is even more smoothed,
since every halo becomes extended. To see the relative contribution
of each term to the large-scale 2PCF, we note that in equation (14),
$\xi_{\rm cc} \propto b_c^2$, $\xi_{\rm cs} \propto b_c b_s$, and $\xi_{\rm ss} \propto b_s^2$
on large scales, where $b_c$ and
$b_s$ are the large-scale bias factors for central and satellite
galaxies, respectively. Since satellites on average reside in more massive
haloes than central galaxies, the value of $b_s$ is higher than that of
$b_c$ (roughly by tens of per cent for luminosity-threshold samples).
From equation (14), we see that the relative contributions to the
large-scale 2PCF from the two-halo cen-cen, cen-sat, and sat-sat
terms are $1 : 2f_n f_b : (f_n f_b)^2$, with
$f_n = \bar{n}_s/\bar{n}_c = f_{\rm sat}/(1 - f_{\rm sat})$
the satellite to central galaxy number density ratio and $f_b = b_s/b_c$
the satellite to central galaxy bias ratio. For the sample we
consider, the ratios are $1 : 25\% : 1.6\%$. For lower luminosity samples
with higher satellite fractions, we expect the contributions from the
two-halo cen-sat and sat-sat terms to be substantially higher.
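
The ratio arithmetic can be checked with a short Python sketch. The satellite fraction of about 7 per cent is quoted above. The bias ratio is not stated in this section, so the value 1.66 used below is an illustrative guess picked to match the quoted cen-sat contribution, not a number from the paper.

```python
def two_halo_ratios(f_sat, f_b):
    """Large-scale cen-cen : cen-sat : sat-sat contributions (cen-cen = 1)."""
    f_n = f_sat / (1.0 - f_sat)   # satellite-to-central number density ratio
    x = f_n * f_b
    return 1.0, 2.0 * x, x * x

# f_sat ~ 7% is quoted in the text; f_b = 1.66 is an illustrative guess
# chosen here to match the quoted ~25%, not a value given in the paper.
cc, cs, ss = two_halo_ratios(0.07, 1.66)
print(f"1 : {cs:.1%} : {ss:.1%}")
```

With these inputs the sat-sat contribution comes out at about 1.6 per cent, consistent with the quoted ratios, which suggests a satellite bias roughly 60 to 70 per cent higher than the central bias for this sample.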

The decomposition of the redshift-space 2PCF monopole
(top-right panel) and the relative amplitudes of the various terms
are similar to the case of $w_p$. The bottom two panels show the
case of the quadrupole $\xi_2$ and hexadecapole $\xi_4$, where each term is
multiplied by a separation-dependent factor so that both the small-scale and
large-scale signals show up reasonably well. The Fingers-of-God effect
(Jackson 1972; Huchra 1988) from one-halo terms causes a
positive quadrupole. In the $\xi_2$ panel, we see that the influence of the
one-halo terms can extend to about $10\,h^{-1}{\rm Mpc}$ in the quadrupole.
The negative quadrupole on large scales manifests the Kaiser effect
(Kaiser 1987; Hamilton 1992) caused by the coherent motion of
haloes, falling into overdense regions and streaming out of
underdense regions. The two-halo cen-cen term dominates the large-scale
quadrupole, but the cen-sat term is also important. Both terms show
low positive quadrupole signals towards small scales, caused by the
random motion of haloes (and galaxies). The two-halo sat-sat term
makes an almost negligible contribution to the quadrupole on all
scales. The hexadecapole $\xi_4$ (bottom-right panel) is mostly
positive for all components. The relative contributions from different
components are similar to the quadrupole case.

The projected 2PCF and the redshift-space 2PCF multipoles
are usually the quantities to model. The 3D redshift-space 2PCF
measurements are commonly displayed as contours of $\xi(r_p, r_\pi)$,
which make the redshift-space distortion effects on all scales
easy to visualize. It is instructive to have the corresponding
one-halo and two-halo components to gain a better intuition about
the redshift-space distortions. Figure 2 shows such a decomposition
measured from the mock catalogs, which can also be calculated
using the $\xi(r_p, r_\pi)$ component tables.

The leftmost panel shows the total redshift-space 2PCF of the
sample, with the Fingers-of-God and Kaiser effects clearly seen.
The Fingers-of-God effect, limited to small transverse separation
$r_p$, is mainly contributed by the one-halo terms (two middle panels
on the top). The one-halo sat-sat component appears to be more
extended than the one-halo cen-sat component in both the transverse
and the line-of-sight directions. In the transverse direction, this can be
explained by the fact that the largest one-halo sat-sat (cen-sat) pair
separation is about the diameter (radius) of the largest haloes. In the
line-of-sight direction, the elongation is mainly a result of galaxy
motion inside haloes. The relative line-of-sight velocity of sat-sat
pairs is higher than that of cen-sat pairs, causing the one-halo
sat-sat component to be more extended (shallower profile as a function
of $r_\pi$). The total one-halo term (rightmost panel on the top) is
dominated by the cen-sat and sat-sat components at small $r_p$ and slightly
large $r_\pi$, respectively.

The three two-halo components and the total two-halo term
are shown in the bottom panels of Figure 2. In each component, the
double-hump feature at small $r_p$ reflects the halo-exclusion effect.
The effect would lead to a hole at the centre if the real-space 2PCF
were plotted here. The shift in the line-of-sight galaxy positions in
redshift space from galaxy peculiar motion partially fills
the hole. The two-halo cen-cen component shows an overall Kaiser
squashing effect along the line of sight. However, the contours at
small $r_p$ are elongated along the line of sight, like the
Fingers-of-God effect. This is caused by the random motion of haloes and
that of central galaxies with respect to haloes (i.e. a non-zero
central velocity bias). The two-halo cen-sat component shows a much
stronger line-of-sight elongation, up to a few Mpc in $r_p$. The
reason lies in the motion of satellites inside haloes, which causes the
average redshift-space distribution of satellites to appear extended
along the line of sight in an average halo hosting the satellites of
the two-halo cen-sat pairs. The line-of-sight elongation pattern is
even stronger in the two-halo sat-sat component: the correlation
of elongated haloes (as a result of the redshift-space spatial
distribution of satellites inside haloes) completely suppresses the Kaiser
effect even on the largest scales shown here ($\sim 20\,h^{-1}{\rm Mpc}$). The
total two-halo term is dominated by the cen-cen component, with
a substantial contribution from the cen-sat component. The sat-sat
component does not make an important contribution for this
sample. As discussed before, we expect the two-halo cen-sat and
sat-sat components to become more important for galaxy samples with
lower luminosity thresholds and higher satellite fractions.

Overall, for the 3D redshift-space 2PCF $\xi(r_p, r_\pi)$, different
components of the one-halo and two-halo terms have different transverse
ranges of line-of-sight elongation. The profile along the line of
sight also depends on the type of pairs in consideration,
becoming increasingly shallower from cen-cen, cen-sat, to sat-sat
components. For each component, the streaming model (e.g. Peebles
1980) usually adopted in simple models of redshift-space
distortions, which is essentially a convolution of the real-space
2PCF with a velocity dispersion kernel, should work well. For the total
redshift-space 2PCF, our results indicate that it is hard to use a single
velocity dispersion kernel to accurately model the redshift-space
distortion effect. The different components are needed if one wishes to
develop an accurate analytic model (e.g. Tinker 2007).
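
For reference, a commonly used form of the streaming model can be written as follows. This is the generic textbook expression, not a formula taken from this paper. The pairwise line-of-sight velocity distribution $P$, the symbol $\xi_{\rm real}$ for the real-space 2PCF, and the convention of expressing velocities in distance units (dividing by the Hubble parameter) are assumptions of this sketch.

```latex
1 + \xi(r_p, r_\pi) = \int_{-\infty}^{\infty}
  \left[1 + \xi_{\rm real}(r)\right]\,
  P\!\left(v_{\rm los} = r_\pi - y \,\middle|\, r_p, y\right)\, dy,
\qquad r = \sqrt{r_p^2 + y^2}.
```

In the context of the decomposition above, each pair-type component would carry its own $\xi_{\rm real}$ and its own $P$, which is why a single kernel applied to the total 2PCF is hard to make accurate.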

Figure 3. Left: Constraints on $\log M_{\rm min}$ and $\sigma_{\log M}$ from the 2PCFs with the fiducial galaxy sample. The model 2PCFs are calculated with the method introduced
in this paper. Blue and black contours are for the cases of modelling $w_p$ only and jointly modelling $w_p+\xi_{0/2/4}$, respectively. The 68.3% and 95.4% confidence
levels are shown for each case. Right: Constraints on the central and satellite velocity bias parameters ($\alpha_c$ and $\alpha_s$) for the fiducial galaxy sample from jointly
modelling $w_p+\xi_{0/2/4}$. The red asterisk in each panel indicates the value from the fiducial model.

Finally, we investigate the constraints on the HOD
parameters from projected and redshift-space 2PCFs. The 2PCFs predicted
from the fiducial set of HOD parameters are used as the input
measurements, and the full covariance matrix from Guo et al. (2015a)
measured from the CMASS data is adopted. The model uncertainty
caused by the finite volume of the simulation is also accounted
for by rescaling the covariance matrix (see Appendix A). We
employ a Markov Chain Monte Carlo method to explore the
parameter space of the 7 HOD parameters, $M_{\rm min}$, $\sigma_{\log M}$, $M_0$, $M_1'$, $\alpha$,
$\alpha_c$, and $\alpha_s$. We first model the projected 2PCF $w_p$ only. The first
five parameters, related to the galaxy mean occupation function, can
be constrained, while there are virtually no constraints on the
velocity bias parameters ($\alpha_c$ and $\alpha_s$), as the line-of-sight
information is lost. We then jointly model $w_p$ and the redshift-space 2PCF
multipoles $\xi_{0/2/4}$. We find that redshift-space 2PCFs help tighten
the constraints mainly on $M_{\rm min}$ and $\sigma_{\log M}$, the two parameters
for the mean occupation function of central galaxies. In the left
panel of Figure 3, we compare the constraints (marginalized $1\sigma$
and $2\sigma$ contours) from $w_p$ only (blue) and $w_p+\xi_{0/2/4}$ (black). The
constraints on the parameters for the mean occupation function of
satellite galaxies are only slightly improved, mainly in $M_0$. In
general, compared to the $w_p$-only case, redshift-space 2PCFs do not
lead to a substantial improvement in the constraints on the HOD parameters
related to the occupation function. The reason may be that
the projected 2PCF $w_p$ is not independent of the redshift-space
2PCFs, so the information content in $\xi_{0/2/4}$ for constraining the
occupation-related parameters largely overlaps with that in
$w_p$. The correlated information in $w_p$ and $\xi_{0/2/4}$ is encoded in
the covariance matrix. Therefore, when jointly modelling $w_p$ and
$\xi_{0/2/4}$, it is important to use the full covariance matrix, including
the covariances between $w_p$ and $\xi_{0/2/4}$, to avoid double counting
the information content and artificially tightening the HOD
constraints.
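
The double-counting issue can be illustrated with a toy $\chi^2$ calculation in Python. The two-element data vector, the correlation coefficient of 0.8, and the helper names are all hypothetical and chosen only to show the effect of dropping the cross-covariance between two correlated measurements.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def chi2(residual, cov):
    """chi^2 = d^T C^{-1} d for residual vector d and covariance C."""
    y = solve(cov, residual)
    return sum(r * v for r, v in zip(residual, y))

# Toy residuals (data minus model) for one w_p bin and one multipole bin.
d = [1.0, 1.0]
cov_full = [[1.0, 0.8], [0.8, 1.0]]   # includes the cross-covariance
cov_diag = [[1.0, 0.0], [0.0, 1.0]]   # cross-covariance dropped

chi2_full = chi2(d, cov_full)
chi2_diag = chi2(d, cov_diag)
print(chi2_full, chi2_diag)
```

For this toy case the $\chi^2$ without the cross-covariance is larger (2 versus about 1.11), so the same model offset looks more significant than it should, which corresponds to artificially tight constraints.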

The redshift-space distortions are caused by the peculiar
motion of galaxies. The peculiar motion of haloes is present in the simulation
and built into the tables, so modelling redshift-space 2PCFs leads to
constraints on galaxy motion inside haloes, i.e. the central and
satellite velocity bias parameters. The right panel of Figure 3 shows
that the velocity bias parameters can be clearly detected for the
fiducial sample. Velocity bias parameters have been constrained from
redshift-space clustering for the $z \sim 0.5$ BOSS CMASS
galaxies (Guo et al. 2015a,b; Reid et al. 2014) and the $z \sim 0.1$ SDSS Main
galaxies (see Guo et al. 2015c and Guo et al. 2015d for applications of
the modelling method based on simulation particles and subhaloes,
respectively). More discussion of the velocity bias constraints and
their implications can be found in Guo et al. (2015a).


4 SUMMARY AND DISCUSSION 

In this paper, we introduce a simulation-based method to
accurately and efficiently model galaxy 2PCFs in projected and redshift
spaces. The basic idea is to make use of a high-resolution
simulation and tabulate all the halo information necessary for galaxy
clustering calculation. Then, on top of the tables, galaxy 2PCFs can
be computed with the galaxy-halo relation specified by the HOD or
CLF model. We also provide a version that applies to and extends
the SHAM method. Based on the method, we also study the
decomposition of the projected and redshift-space galaxy 2PCFs into
different components according to the type of galaxy pairs.

The proposed method is accurate, since it is directly based on
high-resolution simulations. Effects like halo exclusion,
non-linear evolution, scale-dependent halo bias, and non-sphericity of
haloes, which are difficult to deal with in analytic methods of
computing galaxy 2PCFs, are all automatically accounted for in the
simulation-based method. The method also breaks the 2PCFs into
all the one-halo and two-halo components based on the nature of
galaxy pairs and computes each component accurately, which is
usually not the case in analytic methods (especially for the
two-halo term). When building the tables, the same binning scheme
(in pair separation and in angle) and the same integration
procedure as used in the observational measurements are adopted, so there
is no binning-related issue when comparing the model prediction
with the measurements. The method is equivalent to measuring the
model galaxy 2PCFs from mock catalogs and is as accurate as what
the mean mock catalog can achieve. The mock catalogs are
constructed by populating galaxies (using tracer particles) into haloes
identified in the simulation, according to the halo occupation
specified by the HOD/CLF model. However, the method is more
efficient, as it avoids the construction of mock catalogs and the
measurement of the 2PCFs from the mocks. Instead, ‘populating
galaxies’ and ‘measuring the 2PCFs’ are performed analytically within
the HOD/CLF framework. This greatly reduces the computational
time and makes it possible to efficiently explore the parameter space
when modelling the 2PCF data.

A similar method working in Fourier space can be easily
developed to model the galaxy redshift-space power spectrum. The
method can also be generalized to other clustering statistics, e.g.
the angular 2PCF of galaxies, the two-point cross-correlation function of
galaxies, and galaxy-galaxy lensing. Generalizing the method to the
three-point correlation function (3PCF) of galaxies is also
possible. In principle, there are more components for the 3PCF:
cen-sat-sat and sat-sat-sat triplets for the one-halo term, cen-(cen-sat),
cen-(sat-sat), sat-(cen-sat), and sat-(sat-sat) triplets for the two-halo
term (the pair in the parentheses is in the same halo), and
cen-cen-cen, cen-cen-sat, cen-sat-sat, and sat-sat-sat triplets for the
three-halo term. More importantly, compared to the 2PCF case, the
dimension of each 3PCF component table will increase (e.g. two
sides and the angle in between for a triangle configuration and
three halo mass indices). To make such a method suitable for
3PCF modelling, further simplification is necessary, e.g. through
multipole or Fourier expansion (e.g. Szapudi 2004; Zheng 2004b;
Slepian & Eisenstein 2015).

To make use of the high precision of small- to intermediate-scale
2PCF measurements to help constrain cosmological
parameters (e.g. Zheng & Weinberg 2007; Reid et al. 2014), a set of
tables needs to be prepared based on simulations with different
cosmological parameters or by rescaling one simulation to different
cosmological models (e.g. Zheng et al. 2002; Tinker et al. 2006;
Angulo & White 2010; Reid et al. 2014; Guo et al. 2015c). Even
with one cosmological model, there may be situations that need
more tables. For example, in the particle-based model, random
particles are selected to trace satellite galaxies by default. However, the
difference between the spatial distributions of satellites and dark
matter can be an additional parameter to be constrained. For such
a purpose, one needs to build different sets of tables using tracer
particles of different distributions. In either of the above cases (or
any case that needs to extend the tables), the total size of the tables
would have an order-of-magnitude increase. Compared with
methods of directly populating simulations, such an increase in table
size is still reasonable and manageable.

With one simulation, we do not have the global or ensemble
average properties of haloes. That is, the model with one
simulation has uncertainty caused by the finite volume effect. One can
use multiple simulations with different realizations of the initial
conditions to build the average tables, which reduces the model
uncertainty. The model uncertainty should be included when modelling
data. In Appendix A, we show that this can be done by rescaling
the covariance matrix of the measurements based on the ratio of
simulation and survey volumes. For any simulation, the fluctuation
modes with wavelengths longer than the box size are missing, so
the application of our modelling method should be limited to scales
much smaller than the simulation box size. This is particularly true
for redshift-space distortion modelling, since the velocity field is
more sensitive to large-scale modes than the density field.

In presenting the method, the halo variable is adopted to be
halo mass (or characteristic velocity for the subhalo case) to build
the tables. The corresponding HOD/CLF model assumes that the
statistical properties of galaxies inside haloes only depend on halo
mass, not on halo environment or growth history. Clustering of
haloes at fixed mass is found to depend on the assembly history
(a.k.a. assembly bias; e.g. Gao et al. 2005; Wechsler et al. 2006;
Zhu et al. 2006; Jing et al. 2007). There is room for the galaxy
content in haloes of fixed mass to depend on halo formation history,
which would affect galaxy clustering and HOD constraints (e.g.
Zentner et al. 2014), although no clear evidence is found in
hydrodynamic galaxy formation simulations (e.g. Berlind et al. 2003)
or galaxy clustering measurements (e.g. Lin et al. 2015). As
mentioned in Section 2, the halo variable in our method is not
necessarily the halo mass. It can certainly be a set of variables, like
halo mass plus a variable characterizing halo formation history (e.g.
halo concentration or formation redshift). With tables built in terms
of the set of variables, along with an HOD/CLF model
depending on these variables, the simulation-based method works in the
same way as presented in this paper. However, the efficiency of
the method drops sharply when including more halo variables. The
limitation is mainly set by the computation of the two-halo terms,
where both the table size and computational time scale as $O(N^2)$,
with $N$ the total number of bins in halo properties (e.g. with $N_1$
halo mass bins and $N_2$ halo formation time bins, $N = N_1 N_2$).
In practice, we may be barely able to accommodate the case of two
halo variables, by choosing bin sizes to minimize the table size and
computational cost without sacrificing the accuracy of the method.
Before resorting to directly populating the simulations, a possible
way of circumventing the limitation is to use some combination of
halo variables, reducing the problem to one effective halo variable.
Certainly further investigations are needed to find the appropriate
combination(s).
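
The quadratic scaling can be made concrete with a small Python sketch. The bin counts below are hypothetical and serve only to show how quickly a two-halo table grows when a second halo variable is added. The function name and the simple count of one entry per pair of halo bins per separation bin are also assumptions of this illustration.

```python
def two_halo_table_entries(n_halo_bins, n_sep_bins):
    """Entries in one two-halo table: all halo-bin pairs per separation bin."""
    return n_halo_bins ** 2 * n_sep_bins

# Hypothetical bin counts, chosen only to illustrate the scaling.
n_mass, n_form, n_sep = 100, 10, 20
one_var = two_halo_table_entries(n_mass, n_sep)            # mass only
two_var = two_halo_table_entries(n_mass * n_form, n_sep)   # mass x formation time
print(one_var, two_var, two_var // one_var)
```

In this example, adding ten bins in a second halo variable multiplies the table size by a factor of one hundred, the square of the number of added bins.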

A different approach to modelling galaxy clustering is through an
emulator (e.g. Kwan et al. 2015). With this approach, galaxy
correlation functions are first obtained with mock catalogs from $N$-body
simulations, spanning a range of HOD parameters. Then the
emulator works by interpolation to predict the galaxy correlation function
for any given set of HOD parameters. Compared to the method we
propose in this paper, the emulator can be extremely fast, since it
only performs interpolations and avoids any calculation at the level
of dark matter haloes. In principle, the emulator can be generalized
to interpolate among the one-halo and two-halo component
contributions to the 2PCFs. However, by construction, the emulator only
operates with a certain HOD form and within a certain range of
HOD parameters for the interpolation to work and for the accuracy
to be under control. The method we propose performs direct
calculations with clear physical meanings based on halo properties,
and therefore it does not suffer from the above restrictions of an
emulator.

With increasingly precise measurements of galaxy
clustering from forthcoming large galaxy surveys, such as DESI
(Levi et al. 2013) and Euclid (Laureijs et al. 2011), we expect that
the accurate and efficient modelling method introduced in this work
and its generalizations will have great potential and wide
applications.
APPENDIX A: COVARIANCE MATRIX WITH MODEL
UNCERTAINTY

Let us consider the case that we use a model built on one
simulation in a volume $V_m$ (‘m’ for model) to interpret the observation
obtained from a survey volume $V_o$ (‘o’ for observation). What
covariance matrix should we use to model the data? The covariance
matrix estimated for the observation tells us the covariance in the
observational data. However, the model is based on a simulation
with a finite volume, and therefore it is not the global model or the
model from an ensemble average. The model itself has uncertainty,
and the modelling needs to account for this. To derive the
effective covariance matrix $C^{\rm eff}$ to be used in the modelling, let us
define the $i$-th data point measured in the observational volume $V_o$
as $F^o_i$, the $i$-th data point from the model with simulation volume
$V_m$ as $F^m_i$, and the global averages (or the ensemble averages) of
the observational and model data points as $F_{o,i}$ and $F_{m,i}$,
respectively. Note that for an accurate model that reflects reality, we
have $F_{m,i} = F_{o,i}$. That is, the global model reproduces the global
average observation.

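The effective covariance matrix with model uncertainty
included is then

$$
\begin{aligned}
C^{\rm eff}_{ij} &= \left\langle (F^o_i - F^m_i)(F^o_j - F^m_j) \right\rangle && \text{(A1)} \\
&= \left\langle \left[(F^o_i - F_{o,i}) - (F^m_i - F_{m,i})\right]
   \left[(F^o_j - F_{o,j}) - (F^m_j - F_{m,j})\right] \right\rangle && \text{(A2)} \\
&= \left\langle (F^o_i - F_{o,i})(F^o_j - F_{o,j}) \right\rangle
 + \left\langle (F^m_i - F_{m,i})(F^m_j - F_{m,j}) \right\rangle \\
&\quad - \left\langle (F^o_i - F_{o,i})(F^m_j - F_{m,j}) \right\rangle
 - \left\langle (F^m_i - F_{m,i})(F^o_j - F_{o,j}) \right\rangle. && \text{(A3)}
\end{aligned}
$$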


The symbol $\langle\,\rangle$ denotes the global/ensemble average over observations
in volumes of $V_o$ and over models in volumes of $V_m$. From (A1)
to (A2), we make use of the above relation $F_{m,i} = F_{o,i}$. In (A3),
the first term is the element $C^o_{ij}$ of the covariance matrix for the
measurements in volume $V_o$, the second term is the element $C^m_{ij}$
of the covariance matrix for the measurements in volume $V_m$ (since
the model values can be regarded as mock measurements), and both
the third and fourth terms are zero (since there is no correlation
between observational measurements and mock measurements). We
then have



$$
C^{\rm eff}_{ij} = C^o_{ij} + C^m_{ij}, \qquad \text{(A4)}
$$


and the result is expected and intuitive. 
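
The additivity of the two covariance contributions can be checked numerically for a single data point. The Python sketch below draws independent scatter for the observation and the model and compares the variance of their difference with the sum of the two variances. The Gaussian form of the scatter, the two standard deviations, and the function name are assumptions of this illustration.

```python
import random

def effective_variance(sigma_o, sigma_m, n_draws=200_000, seed=1):
    """Monte Carlo estimate of <(F_o - F_m)^2> for independent scatter."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        d_obs = rng.gauss(0.0, sigma_o)   # observation minus global mean
        d_mod = rng.gauss(0.0, sigma_m)   # model minus global mean
        total += (d_obs - d_mod) ** 2
    return total / n_draws

estimate = effective_variance(sigma_o=1.0, sigma_m=0.5)
expected = 1.0 ** 2 + 0.5 ** 2            # C^o + C^m for this single point
print(estimate, expected)
```

Because the two sets of draws are independent, the cross terms average to zero and the estimate should approach the sum of the two variances as the number of draws grows.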

For power spectrum or 2PCF measurements, the
covariance matrix element is inversely proportional to the volume
(Feldman et al. 1994; Tegmark 1997). We can express the effective
covariance matrix in equation (A4) in terms of the one estimated
for the observation and the relative volume of the simulation and
observation,



$$
C^{\rm eff}_{ij} = \left(1 + \frac{V_o}{V_m}\right) C^o_{ij}. \qquad \text{(A5)}
$$
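
In practice, the volume rescaling amounts to multiplying the observational covariance matrix by one plus the ratio of survey volume to simulation volume, which follows from the inverse-volume scaling stated above. A minimal Python sketch, with hypothetical volumes and matrix entries and a hypothetical function name:

```python
def effective_covariance(cov_obs, v_obs, v_sim):
    """Rescale an observational covariance matrix by (1 + V_o / V_m)."""
    factor = 1.0 + v_obs / v_sim
    return [[factor * c for c in row] for row in cov_obs]

# Hypothetical volumes and covariance, for illustration only.
cov_obs = [[4.0, 1.0], [1.0, 9.0]]
cov_eff = effective_covariance(cov_obs, v_obs=2.0, v_sim=8.0)
print(cov_eff)
```

A simulation volume much larger than the survey volume drives the factor towards unity. If tables are averaged over several independent realizations, the combined simulated volume would presumably play the role of the simulation volume, although that extension is an assumption here.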